Dataset statistics
| Number of variables | 8 |
|---|---|
| Number of observations | 908 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 56.9 KiB |
| Average record size in memory | 64.1 B |
Variable types
| NUM | 8 |
|---|---|
Warnings
| df_index has unique values | Unique |
|---|---|
| SM1_Dz(z) has 36 (4.0%) zeros | Zeros |
| NdsCH has 760 (83.7%) zeros | Zeros |
| NdssC has 622 (68.5%) zeros | Zeros |
Reproduction
| Analysis started | 2022-08-25 09:28:47.774150 |
|---|---|
| Analysis finished | 2022-08-25 09:29:06.887163 |
| Duration | 19.11 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
df_index
Real number (ℝ≥0)
| Distinct | 908 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 454.5 |
|---|---|
| Minimum | 1 |
| Maximum | 908 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.1 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 46.35 |
| Q1 | 227.75 |
| median | 454.5 |
| Q3 | 681.25 |
| 95-th percentile | 862.65 |
| Maximum | 908 |
| Range | 907 |
| Interquartile range (IQR) | 453.5 |
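The quantiles above are consistent with linear interpolation between order statistics, which is the default in pandas' `quantile`. The following is a minimal sketch under the assumption that this variable is simply the integer sequence 1 to 908 (suggested by its minimum, maximum, and 908 distinct values); `quantile_linear` is a hypothetical helper, not part of the report's tooling.

```python
def quantile_linear(sorted_values, q):
    """Quantile with linear interpolation between the two nearest order statistics."""
    pos = q * (len(sorted_values) - 1)      # fractional index into the sorted data
    lower = int(pos)                        # floor, since pos is non-negative
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


# Assumed contents of the variable: 1, 2, ..., 908 (already sorted).
values = list(range(1, 909))

q05 = quantile_linear(values, 0.05)
q1 = quantile_linear(values, 0.25)
median = quantile_linear(values, 0.50)
q3 = quantile_linear(values, 0.75)
q95 = quantile_linear(values, 0.95)
iqr = q3 - q1

print(q05, q1, median, q3, q95, iqr)
```

Under that assumption the printed values should agree with the table to rounding: for example, Q1 sits at fractional index 0.25 × 907 = 226.75, which gives 227.75.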
Descriptive statistics
| Standard deviation | 262.2613201 |
|---|---|
| Coefficient of variation (CV) | 0.5770326074 |
| Kurtosis | -1.2 |
| Mean | 454.5 |
| Median Absolute Deviation (MAD) | 227 |
| Skewness | 0 |
| Sum | 412686 |
| Variance | 68781 |
| Monotonicity | Strictly increasing |
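As a sanity check, the descriptive statistics above can be recomputed with Python's standard library. This is a sketch that assumes the variable holds the integers 1 to 908 and that the reported variance and standard deviation use the sample (n − 1) denominator; both assumptions are inferred from the numbers in the table, not stated by the report.

```python
import statistics

# Assumed contents of the variable: 1, 2, ..., 908.
values = list(range(1, 909))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, n - 1 denominator
std_dev = statistics.stdev(values)       # square root of the sample variance
cv = std_dev / mean                      # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: median of the distances to the median.
mad = statistics.median([abs(v - median) for v in values])
# Strictly increasing means every value is smaller than its successor.
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(total, mean, variance, std_dev, cv, mad, strictly_increasing)
```

For the sequence 1..n the sample variance has the closed form n(n + 1)/12, which for n = 908 is 68781, matching the table.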
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.1% | |
| 625 | 1 | 0.1% | |
| 599 | 1 | 0.1% | |
| 600 | 1 | 0.1% | |
| 601 | 1 | 0.1% | |
| 602 | 1 | 0.1% | |
| 603 | 1 | 0.1% | |
| 604 | 1 | 0.1% | |
| 605 | 1 | 0.1% | |
| 606 | 1 | 0.1% | |
| Other values (898) | 898 | 98.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.1% | |
| 2 | 1 | 0.1% | |
| 3 | 1 | 0.1% | |
| 4 | 1 | 0.1% | |
| 5 | 1 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 908 | 1 | 0.1% | |
| 907 | 1 | 0.1% | |
| 906 | 1 | 0.1% | |
| 905 | 1 | 0.1% | |
| 904 | 1 | 0.1% |
CICO
Real number (ℝ≥0)
| Distinct | 502 |
|---|---|
| Distinct (%) | 55.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.898128855 |
|---|---|
| Minimum | 0.667 |
| Maximum | 5.926 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.1 KiB |
Quantile statistics
| Minimum | 0.667 |
|---|---|
| 5-th percentile | 1.7754 |
| Q1 | 2.347 |
| median | 2.934 |
| Q3 | 3.407 |
| 95-th percentile | 4.16925 |
| Maximum | 5.926 |
| Range | 5.259 |
| Interquartile range (IQR) | 1.06 |
Descriptive statistics
| Standard deviation | 0.7560884791 |
|---|---|
| Coefficient of variation (CV) | 0.2608884963 |
| Kurtosis | -0.04150551962 |
| Mean | 2.898128855 |
| Median Absolute Deviation (MAD) | 0.5285 |
| Skewness | 0.04545787455 |
| Sum | 2631.501 |
| Variance | 0.5716697882 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.126 | 14 | 1.5% | |
| 3.08 | 11 | 1.2% | |
| 2.377 | 9 | 1.0% | |
| 2.834 | 7 | 0.8% | |
| 2.08 | 7 | 0.8% | |
| 3.252 | 7 | 0.8% | |
| 2.508 | 7 | 0.8% | |
| 2.479 | 7 | 0.8% | |
| 3.179 | 6 | 0.7% | |
| 2.216 | 6 | 0.7% | |
| Other values (492) | 827 | 91.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.667 | 2 | 0.2% | |
| 0.965 | 3 | 0.3% | |
| 0.973 | 1 | 0.1% | |
| 1 | 3 | 0.3% | |
| 1.075 | 1 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.926 | 1 | 0.1% | |
| 5.158 | 1 | 0.1% | |
| 4.88 | 1 | 0.1% | |
| 4.829 | 1 | 0.1% | |
| 4.81 | 1 | 0.1% |
SM1_Dz(z)
Real number (ℝ≥0)
| Distinct | 186 |
|---|---|
| Distinct (%) | 20.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6284680617 |
|---|---|
| Minimum | 0 |
| Maximum | 2.171 |
| Zeros | 36 |
| Zeros (%) | 4.0% |
| Memory size | 7.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.134 |
| Q1 | 0.223 |
| median | 0.57 |
| Q3 | 0.89275 |
| 95-th percentile | 1.4396 |
| Maximum | 2.171 |
| Range | 2.171 |
| Interquartile range (IQR) | 0.66975 |
Descriptive statistics
| Standard deviation | 0.4284590914 |
|---|---|
| Coefficient of variation (CV) | 0.6817515759 |
| Kurtosis | -0.117152754 |
| Mean | 0.6284680617 |
| Median Absolute Deviation (MAD) | 0.347 |
| Skewness | 0.6950900326 |
| Sum | 570.649 |
| Variance | 0.183577193 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.223 | 135 | 14.9% | |
| 0.134 | 74 | 8.1% | |
| 0.405 | 69 | 7.6% | |
| 0.331 | 39 | 4.3% | |
| 0 | 36 | 4.0% | |
| 0.693 | 26 | 2.9% | |
| 0.56 | 25 | 2.8% | |
| 0.496 | 24 | 2.6% | |
| 0.251 | 21 | 2.3% | |
| 0.58 | 16 | 1.8% | |
| Other values (176) | 443 | 48.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 36 | 4.0% | |
| 0.134 | 74 | 8.1% | |
| 0.223 | 135 | 14.9% | |
| 0.251 | 21 | 2.3% | |
| 0.288 | 1 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.171 | 1 | 0.1% | |
| 2.071 | 1 | 0.1% | |
| 2.044 | 1 | 0.1% | |
| 1.86 | 1 | 0.1% | |
| 1.834 | 1 | 0.1% |
GATS1i
Real number (ℝ≥0)
| Distinct | 557 |
|---|---|
| Distinct (%) | 61.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.29359141 |
|---|---|
| Minimum | 0.396 |
| Maximum | 2.92 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.1 KiB |
Quantile statistics
| Minimum | 0.396 |
|---|---|
| 5-th percentile | 0.799 |
| Q1 | 0.95075 |
| median | 1.2405 |
| Q3 | 1.56225 |
| 95-th percentile | 2.00985 |
| Maximum | 2.92 |
| Range | 2.524 |
| Interquartile range (IQR) | 0.6115 |
Descriptive statistics
| Standard deviation | 0.394302736 |
|---|---|
| Coefficient of variation (CV) | 0.3048124261 |
| Kurtosis | 0.3268399411 |
| Mean | 1.29359141 |
| Median Absolute Deviation (MAD) | 0.2995 |
| Skewness | 0.7231072608 |
| Sum | 1174.581 |
| Variance | 0.1554746476 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.941 | 8 | 0.9% | |
| 1.179 | 7 | 0.8% | |
| 0.938 | 7 | 0.8% | |
| 0.954 | 7 | 0.8% | |
| 0.871 | 7 | 0.8% | |
| 1.189 | 6 | 0.7% | |
| 1.6 | 6 | 0.7% | |
| 1.571 | 6 | 0.7% | |
| 1.077 | 5 | 0.6% | |
| 0.95 | 5 | 0.6% | |
| Other values (547) | 844 | 93.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.396 | 1 | 0.1% | |
| 0.421 | 1 | 0.1% | |
| 0.523 | 1 | 0.1% | |
| 0.595 | 1 | 0.1% | |
| 0.618 | 1 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.92 | 1 | 0.1% | |
| 2.698 | 1 | 0.1% | |
| 2.672 | 1 | 0.1% | |
| 2.609 | 1 | 0.1% | |
| 2.606 | 1 | 0.1% |
NdsCH
Real number (ℝ≥0)
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.2290748899 |
|---|---|
| Minimum | 0 |
| Maximum | 4 |
| Zeros | 760 |
| Zeros (%) | 83.7% |
| Memory size | 7.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 4 |
| Range | 4 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.6053349942 |
|---|---|
| Coefficient of variation (CV) | 2.642520071 |
| Kurtosis | 13.81346349 |
| Mean | 0.2290748899 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.400814701 |
| Sum | 208 |
| Variance | 0.3664304552 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=5)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 760 | 83.7% | |
| 1 | 107 | 11.8% | |
| 2 | 29 | 3.2% | |
| 4 | 7 | 0.8% | |
| 3 | 5 | 0.6% |
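Because this variable (the one flagged as "NdsCH has 760 (83.7%) zeros") takes only five distinct values, its summary statistics can be recovered from the value counts alone. The sketch below does that arithmetic; it assumes the reported variance uses the sample (n − 1) denominator.

```python
# Value counts copied from the table above: value -> number of observations.
counts = {0: 760, 1: 107, 2: 29, 3: 5, 4: 7}

n = sum(counts.values())                                  # number of observations
total = sum(value * count for value, count in counts.items())
mean = total / n
# Sum of squared deviations from the mean, weighted by how often each value occurs.
squared_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
variance = squared_dev / (n - 1)                          # sample variance (assumed)
std_dev = variance ** 0.5
zeros_pct = 100 * counts[0] / n

print(n, total, mean, variance, std_dev, round(zeros_pct, 1))
```

The heavy concentration at zero is also why the median, Q1, Q3, and the median absolute deviation are all 0 for this variable.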
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 760 | 83.7% | |
| 1 | 107 | 11.8% | |
| 2 | 29 | 3.2% | |
| 3 | 5 | 0.6% | |
| 4 | 7 | 0.8% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 7 | 0.8% | |
| 3 | 5 | 0.6% | |
| 2 | 29 | 3.2% | |
| 1 | 107 | 11.8% | |
| 0 | 760 | 83.7% |
NdssC
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4856828194 |
|---|---|
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 622 |
| Zeros (%) | 68.5% |
| Memory size | 7.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.8612789371 |
|---|---|
| Coefficient of variation (CV) | 1.773336224 |
| Kurtosis | 6.423355202 |
| Mean | 0.4856828194 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.239090332 |
| Sum | 441 |
| Variance | 0.7418014076 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 622 | 68.5% | |
| 1 | 176 | 19.4% | |
| 2 | 81 | 8.9% | |
| 3 | 18 | 2.0% | |
| 4 | 8 | 0.9% | |
| 6 | 2 | 0.2% | |
| 5 | 1 | 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 622 | 68.5% | |
| 1 | 176 | 19.4% | |
| 2 | 81 | 8.9% | |
| 3 | 18 | 2.0% | |
| 4 | 8 | 0.9% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 2 | 0.2% | |
| 5 | 1 | 0.1% | |
| 4 | 8 | 0.9% | |
| 3 | 18 | 2.0% | |
| 2 | 81 | 8.9% |
MLOGP
Real number (ℝ)
| Distinct | 559 |
|---|---|
| Distinct (%) | 61.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.109285242 |
|---|---|
| Minimum | -2.884 |
| Maximum | 6.515 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.1 KiB |
Quantile statistics
| Minimum | -2.884 |
|---|---|
| 5-th percentile | -0.317 |
| Q1 | 1.209 |
| median | 2.127 |
| Q3 | 3.105 |
| 95-th percentile | 4.48745 |
| Maximum | 6.515 |
| Range | 9.399 |
| Interquartile range (IQR) | 1.896 |
Descriptive statistics
| Standard deviation | 1.433180788 |
|---|---|
| Coefficient of variation (CV) | 0.6794627676 |
| Kurtosis | 0.006986919075 |
| Mean | 2.109285242 |
| Median Absolute Deviation (MAD) | 0.937 |
| Skewness | -0.03519131625 |
| Sum | 1915.231 |
| Variance | 2.054007172 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.701 | 10 | 1.1% | |
| 0.8 | 10 | 1.1% | |
| 0.202 | 9 | 1.0% | |
| 1.064 | 9 | 1.0% | |
| 2.604 | 9 | 1.0% | |
| 1.748 | 9 | 1.0% | |
| 1.442 | 8 | 0.9% | |
| 2.193 | 8 | 0.9% | |
| 1.587 | 8 | 0.9% | |
| 1.859 | 7 | 0.8% | |
| Other values (549) | 821 | 90.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -2.884 | 1 | 0.1% | |
| -2.089 | 1 | 0.1% | |
| -2.03 | 1 | 0.1% | |
| -1.96 | 1 | 0.1% | |
| -1.358 | 1 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.515 | 1 | 0.1% | |
| 6.203 | 1 | 0.1% | |
| 6.166 | 1 | 0.1% | |
| 5.934 | 1 | 0.1% | |
| 5.741 | 1 | 0.1% |
LC50[-LOG(mol/L)]
Real number (ℝ≥0)
| Distinct | 827 |
|---|---|
| Distinct (%) | 91.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.064430617 |
|---|---|
| Minimum | 0.053 |
| Maximum | 9.612 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.1 KiB |
Quantile statistics
| Minimum | 0.053 |
|---|---|
| 5-th percentile | 1.68385 |
| Q1 | 3.15175 |
| median | 3.9875 |
| Q3 | 4.9075 |
| 95-th percentile | 6.48365 |
| Maximum | 9.612 |
| Range | 9.559 |
| Interquartile range (IQR) | 1.75575 |
Descriptive statistics
| Standard deviation | 1.455698446 |
|---|---|
| Coefficient of variation (CV) | 0.3581555655 |
| Kurtosis | 0.6638471981 |
| Mean | 4.064430617 |
| Median Absolute Deviation (MAD) | 0.867 |
| Skewness | 0.2521384377 |
| Sum | 3690.503 |
| Variance | 2.119057965 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.208 | 4 | 0.4% | |
| 3.513 | 4 | 0.4% | |
| 3.47 | 3 | 0.3% | |
| 3.979 | 3 | 0.3% | |
| 3.926 | 3 | 0.3% | |
| 3.66 | 3 | 0.3% | |
| 4.54 | 2 | 0.2% | |
| 3.751 | 2 | 0.2% | |
| 3.841 | 2 | 0.2% | |
| 4.499 | 2 | 0.2% | |
| Other values (817) | 880 | 96.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.053 | 1 | 0.1% | |
| 0.15 | 1 | 0.1% | |
| 0.242 | 1 | 0.1% | |
| 0.33 | 1 | 0.1% | |
| 0.361 | 1 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.612 | 1 | 0.1% | |
| 9.354 | 1 | 0.1% | |
| 8.916 | 1 | 0.1% | |
| 8.604 | 1 | 0.1% | |
| 8.571 | 1 | 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only its direction does. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
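The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report itself runs; `pearson_r` is a hypothetical helper name, and the sample data are the MLOGP and LC50 values of the ten first rows shown in this report.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n - 1) factors of covariance and variances cancel, so raw sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# MLOGP and LC50 values copied from the ten first rows of the sample.
mlogp = [1.453, 1.348, 1.348, 1.807, 1.886, 0.706, 2.942, 2.851, 2.942, 1.591]
lc50 = [3.770, 3.115, 3.531, 3.510, 5.390, 1.819, 3.947, 3.513, 4.402, 3.021]

r = pearson_r(mlogp, lc50)
# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([10 * x - 3 for x in mlogp], lc50)
print(round(r, 3), round(r_rescaled, 3))
```

Note that the function divides by zero if either input is constant; a production version would need to handle that case. Ten rows are far too few to say anything about the full dataset, so the printed value is only a demonstration of the arithmetic.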
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
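In other words, ρ is Pearson's r applied to ranks. The sketch below illustrates this with hypothetical helpers (`average_ranks`, `pearson_r`, `spearman_rho`) and assumes that tied values receive the average of their rank positions, a common convention that the text above does not specify.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: y = x cubed.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this toy data the ranks of both variables are identical, so ρ reaches 1 while r stays below 1, which is the sense in which ρ "catches" nonlinear monotonic relationships.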
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
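The pair-counting definition above corresponds to the variant usually called τ-a. The following is a minimal sketch of that definition with a hypothetical helper name; tied pairs count as neither concordant nor discordant here, whereas statistical libraries often default to a tie-corrected variant (τ-b), so results on data with ties may differ.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:        # both variables move the same way
            concordant += 1
        elif direction < 0:      # the variables move in opposite ways
            discordant += 1
        # direction == 0 means a tie in x or y; it is counted in neither group
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations: 2 concordant pairs, 1 discordant pair, 3 pairs in total.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

The quadratic loop over all pairs is fine for illustration and for a dataset of this size (908 rows gives about 412,000 pairs), though faster algorithms exist.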
Phik (φk)
Phik (φk) is a practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
First rows
| | df_index | CICO | SM1_Dz(z) | GATS1i | NdsCH | NdssC | MLOGP | LC50[-LOG(mol/L)] |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3.260 | 0.829 | 1.676 | 0 | 1 | 1.453 | 3.770 |
| 1 | 2 | 2.189 | 0.580 | 0.863 | 0 | 0 | 1.348 | 3.115 |
| 2 | 3 | 2.125 | 0.638 | 0.831 | 0 | 0 | 1.348 | 3.531 |
| 3 | 4 | 3.027 | 0.331 | 1.472 | 1 | 0 | 1.807 | 3.510 |
| 4 | 5 | 2.094 | 0.827 | 0.860 | 0 | 0 | 1.886 | 5.390 |
| 5 | 6 | 3.222 | 0.331 | 2.177 | 0 | 0 | 0.706 | 1.819 |
| 6 | 7 | 3.179 | 0.000 | 1.063 | 0 | 0 | 2.942 | 3.947 |
| 7 | 8 | 3.000 | 0.000 | 0.938 | 1 | 0 | 2.851 | 3.513 |
| 8 | 9 | 2.620 | 0.499 | 0.990 | 0 | 0 | 2.942 | 4.402 |
| 9 | 10 | 2.834 | 0.134 | 0.950 | 0 | 0 | 1.591 | 3.021 |
Last rows
| | df_index | CICO | SM1_Dz(z) | GATS1i | NdsCH | NdssC | MLOGP | LC50[-LOG(mol/L)] |
|---|---|---|---|---|---|---|---|---|
| 898 | 899 | 3.599 | 0.702 | 1.514 | 2 | 1 | 3.247 | 6.183 |
| 899 | 900 | 2.986 | 0.961 | 1.669 | 0 | 4 | 1.798 | 3.152 |
| 900 | 901 | 2.804 | 1.110 | 0.618 | 0 | 6 | 1.317 | 6.254 |
| 901 | 902 | 3.670 | 0.728 | 2.110 | 0 | 3 | 2.288 | 2.964 |
| 902 | 903 | 3.475 | 0.405 | 0.875 | 1 | 2 | 3.148 | 4.803 |
| 903 | 904 | 2.801 | 0.728 | 2.226 | 0 | 2 | 0.736 | 3.109 |
| 904 | 905 | 3.652 | 0.872 | 0.867 | 2 | 3 | 3.983 | 4.040 |
| 905 | 906 | 3.763 | 0.916 | 0.878 | 0 | 6 | 2.918 | 4.818 |
| 906 | 907 | 2.831 | 1.393 | 1.077 | 0 | 1 | 0.906 | 5.317 |
| 907 | 908 | 4.057 | 1.032 | 1.183 | 1 | 3 | 4.754 | 8.201 |